7 research outputs found

    To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

    Full text link
    Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some -- but not all -- inputs. In this paper, we also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2×\times on average, and up to 10.9×\times, on widely studied MIPS datasets.Comment: 12 pages, 8 figures, 2 table

    Complaint-driven Training Data Debugging for Query 2.0

    Full text link
    As the need for machine learning (ML) increases rapidly across all industry sectors, there is a significant interest among commercial database providers to support "Query 2.0", which integrates model inference into SQL queries. Debugging Query 2.0 is very challenging since an unexpected query result may be caused by the bugs in training data (e.g., wrong labels, corrupted features). In response, we propose Rain, a complaint-driven training data debugging system. Rain allows users to specify complaints over the query's intermediate or final output, and aims to return a minimum set of training examples so that if they were removed, the complaints would be resolved. To the best of our knowledge, we are the first to study this problem. A naive solution requires retraining an exponential number of ML models. We propose two novel heuristic approaches based on influence functions which both require linear retraining steps. We provide an in-depth analytical and empirical analysis of the two approaches and conduct extensive experiments to evaluate their effectiveness using four real-world datasets. Results show that Rain achieves the highest recall@k among all the baselines while still returns results interactively.Comment: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Dat

    Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries

    Get PDF
    Abstract Background Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres. Methods This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries. Results In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable for more than 90 per cent of patients except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were: introducing recycling; reducing use of anaesthetic gases; and appropriate clinical waste processing. In phase 4, the top three shortlisted interventions for low–middle-income countries were: introducing reusable surgical devices; reducing use of consumables; and reducing the use of general anaesthesia. Conclusion This is a step toward environmentally sustainable operating environments with actionable interventions applicable to both high– and low–middle–income countries

    MacroBase: Prioritizing Attention in Fast Data

    No full text
    © 2018 Association for Computing Machinery. As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables eficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles

    Machine Learned Cellular Phenotypes in Cardiomyopathy Predict Sudden Death

    No full text
    RATIONALE: Susceptibility to ventricular arrhythmias (VT/VF) is difficult to predict in patients with ischemic cardiomyopathy either by clinical tools or by attempting to translate cellular mechanisms to the bedside. OBJECTIVE: To develop computational phenotypes of patients with ischemic cardiomyopathy, by training then interpreting machine learning (ML) of ventricular monophasic action potentials (MAPs) to reveal phenotypes that predict long-term outcomes. METHODS AND RESULTS: We recorded 5706 ventricular MAPs in 42 patients with coronary disease (CAD) and left ventricular ejection fraction (LVEF) ≀40% during steady-state pacing. Patients were randomly allocated to independent training and testing cohorts in a 70:30 ratio, repeated K=10 fold. Support vector machines (SVM) and convolutional neural networks (CNN) were trained to 2 endpoints: (i) sustained VT/VF or (ii) mortality at 3 years. SVM provided superior classification. For patient-level predictions, we computed personalized MAP scores as the proportion of MAP beats predicting each endpoint. Patient-level predictions in independent test cohorts yielded c-statistics of 0.90 for sustained VT/VF (95% CI: 0.76-1.00) and 0.91 for mortality (95% CI: 0.83-1.00) and were the most significant multivariate predictors. Interpreting trained SVM revealed MAP morphologies that, using in silico modeling, revealed higher L-type calcium current or sodium calcium exchanger as predominant phenotypes for VT/VF. CONCLUSION: Machine learning of action potential recordings in patients revealed novel phenotypes for long-term outcomes in ischemic cardiomyopathy. Such computational phenotypes provide an approach which may reveal cellular mechanisms for clinical outcomes and could be applied to other conditions
    corecore